
    On the discovery of social roles in large scale social systems

    The social role of a participant in a social system is a label conceptualizing the circumstances under which she interacts within it. Social roles may be used as a theoretical tool that explains why and how users participate in an online social system. Social role analysis also serves practical purposes, such as reducing the structure of complex systems to relationships among roles rather than alters, and enabling comparisons of social systems that emerge in similar contexts. This article presents a data-driven approach for the discovery of social roles in large-scale social systems. Motivated by an analysis of the state of the art, the method discovers roles from the conditional triad censuses of user ego-networks, a promising tool because such censuses capture the degree to which basic social forces push a user to interact with others. Clusters of censuses, inferred from samples of a large-scale network carefully chosen to preserve local structural properties, define the social roles. The promise of the method is demonstrated by discovering and discussing the roles that emerge in both Facebook and Wikipedia. The article concludes with a discussion of the challenges and future opportunities in the discovery of social roles in large social systems.
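    As a concrete illustration of the census-and-cluster idea, the sketch below computes the triad census of each user's ego network and clusters the resulting profiles. The random graph, the use of k-means, and the choice of five clusters are illustrative assumptions, not the paper's exact procedure.

```python
# A minimal sketch of role discovery via triad censuses of ego networks,
# assuming directed interaction data; the clustering method and k are
# illustrative stand-ins for the paper's inference procedure.
import networkx as nx
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import normalize

def ego_triad_census(G, user):
    """Triad census of a user's ego network (counts of the 16 triad types)."""
    ego = nx.ego_graph(G, user, radius=1)
    census = nx.triadic_census(ego)          # dict: triad code -> count
    return np.array([census[k] for k in sorted(census)])

G = nx.gnp_random_graph(300, 0.02, directed=True)  # stand-in for real interaction data
X = np.vstack([ego_triad_census(G, u) for u in G.nodes()])
X = normalize(X)                              # compare census profiles, not ego-net sizes

roles = KMeans(n_clusters=5, n_init=10).fit_predict(X)  # clusters ~ candidate social roles
```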

    Seasonality in Dynamic Stochastic Block Models

    Sociotechnological and geospatial processes exhibit time-varying structure that makes insight discovery challenging. This paper proposes a new statistical model for such systems, modeled as dynamic networks, to address this challenge. It assumes that vertices fall into one of k types and that the probability of edge formation at a particular time depends on the types of the incident nodes and on the current time. The time dependencies are driven by unique seasonal processes, which many systems exhibit (e.g., predictable daily spikes in geospatial or web traffic). The paper defines the model as a generative process and gives an inference procedure to recover the seasonal processes from data when they are unknown. Evaluation with synthetic dynamic networks shows the recovery of the latent seasonal processes that drive their formation.
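    To make the generative process concrete, here is a minimal sketch of a dynamic stochastic block model whose edge probabilities are modulated by a seasonal signal. The sinusoidal seasonality and all parameter values are illustrative assumptions rather than the paper's exact parameterization.

```python
# A minimal generative sketch of a seasonal dynamic stochastic block model:
# edge probability depends on the incident nodes' types and the current time.
import numpy as np

rng = np.random.default_rng(0)
n, k, T = 60, 2, 48                       # nodes, block types, time steps
z = rng.integers(0, k, size=n)            # latent type of each vertex

B = np.array([[0.20, 0.02],               # baseline edge rates per type pair
              [0.02, 0.15]])
period = 24                               # e.g., a daily cycle in hourly snapshots

def edge_prob(a, b, t):
    """Edge probability for a type-(a, b) pair at time t with seasonal modulation."""
    season = 1.0 + 0.8 * np.sin(2 * np.pi * t / period)   # illustrative seasonal process
    return np.clip(B[a, b] * season, 0.0, 1.0)

snapshots = []
for t in range(T):
    P = edge_prob(z[:, None], z[None, :], t)    # n x n probability matrix at time t
    A = (rng.random((n, n)) < P).astype(int)    # sample the adjacency snapshot
    np.fill_diagonal(A, 0)
    snapshots.append(A)
```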

    Realistic Traffic Generation for Web Robots

    Realistic web traffic generators are critical to evaluating the capacity, scalability, and availability of web systems. Web traffic generation is a classic research problem, but no existing generator accounts for the characteristics of web robots or crawlers, which are now the dominant source of traffic to a web server. Administrators are thus unable to test, stress, and evaluate how their systems perform in the face of ever-increasing levels of web robot traffic. To resolve this problem, this paper introduces a novel approach to generating synthetic web robot traffic with high fidelity. It generates traffic that accounts for both the temporal and behavioral qualities of robot traffic, using statistical and Bayesian models fitted to the properties of robot traffic seen in web logs from North America and Europe. We evaluate our traffic generator by comparing the characteristics of generated traffic to those of the original data. We look at session arrival rates, inter-arrival times, and session lengths, comparing and contrasting them between generated and real traffic. Finally, we show that our generated traffic affects cache performance similarly to actual traffic under the common LRU and LFU eviction policies.
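    The cache experiment can be pictured with a small simulation: replay a request stream through LRU and LFU caches and compare hit rates. The Zipfian synthetic stream below is an illustrative stand-in for generated robot traffic, not the paper's generator.

```python
# A minimal sketch of an LRU-vs-LFU cache comparison over a request stream.
from collections import Counter, OrderedDict
import random

def lru_hits(requests, capacity):
    cache, hits = OrderedDict(), 0
    for r in requests:
        if r in cache:
            hits += 1
            cache.move_to_end(r)            # mark as most recently used
        else:
            if len(cache) >= capacity:
                cache.popitem(last=False)   # evict least recently used
            cache[r] = True
    return hits / len(requests)

def lfu_hits(requests, capacity):
    cache, freq, hits = set(), Counter(), 0
    for r in requests:
        freq[r] += 1
        if r in cache:
            hits += 1
        else:
            if len(cache) >= capacity:
                cache.remove(min(cache, key=freq.__getitem__))  # evict least frequent
            cache.add(r)
    return hits / len(requests)

random.seed(1)
stream = random.choices(range(1000), weights=[1 / (i + 1) for i in range(1000)], k=50_000)
print(f"LRU hit rate: {lru_hits(stream, 100):.3f}, LFU hit rate: {lfu_hits(stream, 100):.3f}")
```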

    Accurate Local Estimation of Geo-Coordinates for Social Media Posts

    Associating geo-coordinates with the content of social media posts can enhance many existing applications and services and enable a host of new ones. Unfortunately, a majority of social media posts are not tagged with geo-coordinates. Even when location data is available, it may be inaccurate, overly broad, or sometimes fictitious. Contemporary location estimation approaches based on analyzing the content of these posts can identify only broad areas, such as a city, which limits their usefulness. To address these shortcomings, this paper proposes a methodology to narrowly estimate the geo-coordinates of social media posts with high accuracy. The methodology relies solely on the content of these posts and prior knowledge of the wide geographical region from which the posts originate. An ensemble of language models, smoothed over non-overlapping sub-regions of the wider region, lies at the heart of the methodology. Experimental evaluation using a corpus of over half a million tweets from New York City shows that the approach, on average, estimates locations of tweets to within just 2.15 km of their actual positions. Comment: In Proceedings of the 26th International Conference on Software Engineering and Knowledge Engineering, pp. 642-647, 2014.
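    A minimal sketch of the grid-based idea, assuming square sub-regions, unigram models, and Laplace smoothing (all illustrative choices): fit a smoothed language model per cell, then assign a new post to the cell that best explains its words.

```python
# A minimal sketch of geolocation with smoothed per-cell language models.
import math
from collections import Counter, defaultdict

def cell_of(lat, lon, cell_deg=0.02):
    """Map a coordinate to a non-overlapping grid cell (size is illustrative)."""
    return (round(lat // cell_deg), round(lon // cell_deg))

def fit_cells(posts):
    """posts: iterable of (lat, lon, text). Returns per-cell word counts."""
    counts = defaultdict(Counter)
    for lat, lon, text in posts:
        counts[cell_of(lat, lon)].update(text.lower().split())
    return counts

def locate(text, counts, vocab_size, alpha=1.0):
    """Return the cell whose Laplace-smoothed unigram model best explains the text."""
    words = text.lower().split()
    def score(cell):
        c, total = counts[cell], sum(counts[cell].values())
        return sum(math.log((c[w] + alpha) / (total + alpha * vocab_size)) for w in words)
    return max(counts, key=score)

posts = [(40.71, -74.00, "coffee near the ferry terminal"),
         (40.78, -73.96, "great afternoon in the park")]
counts = fit_cells(posts)
vocab = len({w for c in counts.values() for w in c})
print(locate("ferry to the terminal", counts, vocab))
```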

    Visual Entailment: A Novel Task for Fine-Grained Image Understanding

    Existing visual reasoning datasets, such as Visual Question Answering (VQA), often suffer from biases conditioned on the question, image, or answer distributions. The recently proposed CLEVR dataset addresses these limitations and requires fine-grained reasoning, but it is synthetic and consists of similar objects and sentence structures throughout. In this paper, we introduce a new inference task, Visual Entailment (VE), consisting of image-sentence pairs in which the premise is defined by an image, rather than a natural language sentence as in traditional Textual Entailment tasks. The goal of a trained VE model is to predict whether the image semantically entails the text. To realize this task, we build a dataset, SNLI-VE, based on the Stanford Natural Language Inference corpus and the Flickr30k dataset. We evaluate various existing VQA baselines and build a system called Explainable Visual Entailment (EVE) to address the VE task. EVE achieves up to 71% accuracy and outperforms several other state-of-the-art VQA-based models. Finally, we demonstrate the explainability of EVE through cross-modal attention visualizations. The SNLI-VE dataset is publicly available at https://github.com/necla-ml/SNLI-VE.
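    Because SNLI premises are captions drawn from Flickr30k, an SNLI-VE-style dataset can be assembled by replacing each premise sentence with the image it describes. The sketch below illustrates that join; the field names and caption-to-image mapping are illustrative assumptions, not the dataset's exact schema or build script.

```python
# A minimal sketch of assembling VE examples from SNLI plus a caption->image map.
import json

def build_ve_examples(snli_path, caption_to_image):
    """caption_to_image: dict mapping a premise caption to its Flickr30k image id."""
    examples = []
    with open(snli_path) as f:
        for line in f:
            ex = json.loads(line)                      # one SNLI example per line
            image_id = caption_to_image.get(ex["sentence1"])
            if image_id is None or ex["gold_label"] == "-":
                continue                               # skip unmatched or unlabeled pairs
            examples.append({
                "image": image_id,                     # premise: an image, not a sentence
                "hypothesis": ex["sentence2"],
                "label": ex["gold_label"],             # entailment / neutral / contradiction
            })
    return examples
```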

    Finding Street Gang Members on Twitter

    Most street gang members use Twitter to intimidate others, to present outrageous images and statements to the world, and to share recent illegal activities. Their tweets may thus be useful to law enforcement agencies to discover clues about recent crimes or to anticipate ones that may occur. Finding these posts, however, requires a method to discover gang member Twitter profiles. This is a challenging task, since gang members represent a very small population of the 320 million Twitter users. This paper studies the problem of automatically finding gang members on Twitter. It outlines a process to curate one of the largest sets of verifiable gang member profiles that have ever been studied. A review of these profiles establishes differences in the language, images, YouTube links, and emojis gang members use compared to the rest of the Twitter population. Features from this review are used to train a series of supervised classifiers. Our classifier achieves a promising F1 score with a low false positive rate. Comment: 8 pages, 9 figures, 2 tables. Published as a full paper at the 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2016).
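    A minimal sketch of the kind of supervised profile classifier described above, using only TF-IDF features over tweet text with logistic regression; the paper's full feature set also draws on images, YouTube links, and emojis, and the tiny inline dataset here is purely illustrative.

```python
# A minimal sketch of a supervised text classifier over Twitter profiles.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.metrics import f1_score

profiles = ["free my brother squad up tonight",        # concatenated tweets per profile
            "lovely afternoon at the park with family",
            "on the block with the squad tonight",
            "recipe for the best pasta sauce ever"]
labels = [1, 0, 1, 0]                                  # 1 = gang member profile, 0 = other

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                    LogisticRegression(max_iter=1000))
clf.fit(profiles, labels)
print(f1_score(labels, clf.predict(profiles)))         # in practice, score held-out data
```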

    Visual Entailment Task for Visually-Grounded Language Learning

    We introduce a new inference task, Visual Entailment (VE), which differs from traditional Textual Entailment (TE) in that the premise is defined by an image rather than a natural language sentence. A novel dataset, SNLI-VE (publicly available at https://github.com/necla-ml/SNLI-VE), is proposed for the VE task, built from the Stanford Natural Language Inference corpus and the Flickr30k dataset. We introduce a differentiable architecture called the Explainable Visual Entailment model (EVE) to tackle the VE problem. EVE and several other state-of-the-art visual question answering (VQA) based models are evaluated on the SNLI-VE dataset, facilitating grounded language understanding and providing insights into how modern VQA-based models perform. Comment: 4 pages, accepted by the Visually Grounded Interaction and Language (ViGIL) workshop at NeurIPS 2018.

    A Broad Evaluation of the Tor English Content Ecosystem

    Tor is among the most well-known dark nets in the world. It has noble uses, including as a platform for free speech and information dissemination under the guise of true anonymity, but may be culturally better known as a conduit for criminal activity and as a platform to market illicit goods and data. Past studies of the content of Tor support this notion, but were carried out by targeting popular domains likely to contain illicit content. A survey of past studies may thus not yield a complete evaluation of the content and use of Tor. This work addresses this gap by presenting a broad evaluation of the content of the English Tor ecosystem. We perform a comprehensive crawl of the Tor dark web and, through topic and network analysis, characterize the types of information and services hosted across a broad swath of Tor domains and their hyperlink relational structure. We recover nine domain types defined by the information or service they host and, among other findings, unveil how some types of domains intentionally silo themselves from the rest of Tor. We also present measurements that (regrettably) suggest how marketplaces of illegal drugs and services emerge as the dominant type of Tor domain. Our study is the product of crawling over 1 million pages from 20,000 Tor seed addresses, yielding a collection of over 150,000 Tor pages. We intend to make the domain structure publicly available as a dataset at https://github.com/wsu-wacs/TorEnglishContent.
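    One piece of the hyperlink analysis can be sketched as follows: collapse crawled page links into a directed domain-level graph and flag domains that never link outward as candidate silos. The crawl records and .onion addresses below are hypothetical stand-ins for the paper's data, and this is only one of several analyses the study performs.

```python
# A minimal sketch of building a domain-level hyperlink graph from a crawl.
from urllib.parse import urlparse
import networkx as nx

pages = [  # (page_url, [outgoing links]) from a hypothetical crawl
    ("http://aaa111.onion/index", ["http://bbb222.onion/shop"]),
    ("http://bbb222.onion/shop", ["http://bbb222.onion/cart"]),
    ("http://ccc333.onion/wiki", ["http://aaa111.onion/index",
                                  "http://bbb222.onion/shop"]),
]

G = nx.DiGraph()
for url, links in pages:
    src = urlparse(url).netloc
    for link in links:
        dst = urlparse(link).netloc
        if src != dst:                      # keep inter-domain edges only
            G.add_edge(src, dst)

# Domains with out-degree 0 never link outward: candidates for "siloed" domains.
silos = [d for d in G.nodes if G.out_degree(d) == 0]
print(silos)
```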